Unsupervised Vocabulary Adaptation for Morph-based Language Models

Authors

  • André Mansikkaniemi
  • Mikko Kurimo

Abstract

Modeling foreign entity names is an important unsolved problem in morpheme-based language modeling, an approach common for morphologically rich languages. In this paper we present an unsupervised vocabulary adaptation method for morph-based speech recognition. Foreign word candidates are detected automatically in in-domain text using letter n-gram perplexity. Over-segmented foreign entity names are restored to their base forms in the morph-segmented in-domain text, which makes them easier and more reliable to model and recognize. Finally, adapted pronunciation rules are generated with a trainable grapheme-to-phoneme converter. In ASR experiments, the unsupervised method nearly matches supervised adaptation in correctly recognizing foreign entity names.
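
The detection and restoration steps can be illustrated with a minimal Python sketch. It is not the authors' implementation: the trigram order, add-k smoothing, the toy Finnish word list, the perplexity threshold, the "+" morph-boundary marker, and all names (LetterNgramModel, detect_foreign, restore_base_forms) are assumptions chosen for illustration. A letter n-gram model trained on native words assigns a high per-letter perplexity to words with unusual letter sequences. Words above the threshold are treated as foreign candidates, and their morph segmentations are rejoined.

```python
import math
from collections import Counter

# Hypothetical word-boundary symbols used to pad each word.
BOS, EOS = "<", ">"


class LetterNgramModel:
    """Letter n-gram model with add-k smoothing (smoothing choice is an assumption)."""

    def __init__(self, words, n=3, k=0.1):
        self.n = n
        self.k = k
        self.ngrams = Counter()    # counts of (context + next letter)
        self.contexts = Counter()  # counts of each (n-1)-letter context
        alphabet = set()
        for word in words:
            padded = self._pad(word)
            alphabet.update(padded[n - 1:])
            for i in range(n - 1, len(padded)):
                context = padded[i - n + 1:i]
                self.ngrams[context + padded[i]] += 1
                self.contexts[context] += 1
        # One extra symbol reserves probability mass for unseen letters.
        self.vocab_size = len(alphabet) + 1

    def _pad(self, word):
        return BOS * (self.n - 1) + word.lower() + EOS

    def perplexity(self, word):
        """Per-letter perplexity of a word (end-of-word symbol included)."""
        padded = self._pad(word)
        log_prob = 0.0
        steps = 0
        for i in range(self.n - 1, len(padded)):
            context = padded[i - self.n + 1:i]
            numerator = self.ngrams[context + padded[i]] + self.k
            denominator = self.contexts[context] + self.k * self.vocab_size
            log_prob += math.log(numerator / denominator)
            steps += 1
        return math.exp(-log_prob / steps)


def detect_foreign(candidates, model, threshold):
    """Flag words whose letter perplexity under the native model exceeds threshold."""
    return [w for w in candidates if model.perplexity(w) > threshold]


def restore_base_forms(segmented_words, foreign_words, sep="+"):
    """Rejoin morph-segmented words whose surface form was flagged as foreign."""
    restored = []
    for segmented in segmented_words:
        surface = segmented.replace(sep, "")
        restored.append(surface if surface in foreign_words else segmented)
    return restored


if __name__ == "__main__":
    # Toy native training list (hypothetical; a real model would use a large corpus).
    native = ["talo", "kissa", "koira", "auto", "katu",
              "metsä", "järvi", "kala", "sauna", "sisu"]
    model = LetterNgramModel(native)
    in_domain = ["talo+ssa", "washing+ton", "schwarz+en+egg+er"]
    surfaces = [w.replace("+", "") for w in in_domain]
    foreign = set(detect_foreign(surfaces, model, threshold=10.0))
    print(restore_base_forms(in_domain, foreign))
```

With this toy data, "washington" and "schwarzenegger" score well above the threshold and are rejoined, while the native "talo+ssa" keeps its segmentation. The sketch rejoins the whole word; how the paper treats inflectional suffixes attached to foreign names is not stated in this abstract.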

Similar sources

Unsupervised topic adaptation for morph-based speech recognition

Topic adaptation in automatic speech recognition (ASR) refers to the adaptation of language model and vocabulary for improved recognition of in-domain speech data. In this work we implement unsupervised topic adaptation for morph-based ASR, to improve recognition of foreign entity names. Based on first-pass ASR hypothesis similar texts are selected from a collection of articles, which are used ...

Unsupervised morph segmentation and statistical language models for vocabulary expansion

This work explores the use of unsupervised morph segmentation along with statistical language models for the task of vocabulary expansion. Unsupervised vocabulary expansion has large potential for improving vocabulary coverage and performance in different natural language processing tasks, especially in less-resourced settings on morphologically rich languages. We propose a combination of unsupe...

Unsupervised cross-lingual speaker adaptation for HMM-based speech synthesis

In the EMIME project, we are developing a mobile device that performs personalized speech-to-speech translation such that a user’s spoken input in one language is used to produce spoken output in another language, while continuing to sound like the user’s voice. We integrate two techniques, unsupervised adaptation for HMM-based TTS using a word-based large-vocabulary continuous speech recognizer...

Unsupervised Language Model Adapt Transcriptio

Unsupervised adaptation methods have been applied successfully to the acoustic models of speech recognition systems for some time. Relatively little work has been carried out in the area of unsupervised language model adaptation however. The work presented here uses the output of a speech recogniser to adapt the backoff n-gram language model used in the decoding process. We report results for t...

Unsupervised adaptive speech technology for limited resource languages: a case study for Tamil

This paper evaluates adaptive speech technology for creating low cost, rapidly deployable speech recognizers for new languages with very limited data. A multi-modal (speech and touch) dialog system in Tamil, which delivered agricultural information to rural villagers, is described. Based on the field recordings from this system, a number of automatic speech recognition (ASR) adaptation techniqu...

Publication date: 2012